ABSTRACT


The geoscience and chemistry communities share numerous practices and a dependency on data
standards. The International Union of Pure and Applied Chemistry (IUPAC) and the American
Geophysical Union (AGU) have recently begun to explore and collaborate on approaches, and to share
lessons learned, in implementing the FAIR Guiding Principles as they apply to data in their respective
communities. This paper summarizes their efforts to date, highlighting the importance of existing
communities, Scientific Unions, standards bodies and societies in taking deliberate steps to encourage
researcher adoption of the FAIR tenets.


1. SCIENTIFIC DATA ARE VALUABLE RESEARCH OBJECTS


Researchers in the geosciences and chemistry often find data valuable to their work by identifying
relevant articles and then gaining access through the supplementary information or by requesting the data
from the authors. Few researchers place their data in a trusted repository, share it openly and link it to
their article using a citation. Data in the supplement tend to be neither well-documented nor indexed for
discovery. In a recent study of papers in Science, the authors analyzed 180 papers to determine the
accessibility of data and code [1]. Only 13% of the papers included the data and code necessary to attempt
reproducing the research. It is important to note that the data policy for Science during the period reviewed
required authors to make their data and code available to requestors. Even with some level of work to
obtain them, only 36% of the authors of the papers in the study shared the data and code that supported
their research. Only a fraction of the authors therefore complied with the journal’s data policy.


These numbers indicate that our scientific data are at high risk of being unavailable or lost, with the
probability increasing over time. The data and code that underpin research and the scholarly record are
essential for the integrity, transparency and reproducibility of science. Our data must be treated as
valuable research objects. They must be FAIR for humans and machines, where FAIR stands for Findable,
Accessible, Interoperable and Reusable.


Cross-domain research teams are challenged to find data from domains outside their area of expertise
to meet the needs of their research. The FAIR Guiding Principles [2] provide guidance for all domains on
how data can be better managed and preserved, so that state-of-the-art automated workflows can support
the scientific record more accurately and at a larger scale. The desired result is that data are easier to
discover, well-documented, and reusable in future research, for both researchers and expert systems.


The data needed to address complex interdisciplinary scientific questions and the problems of the
future will benefit from common guidelines and best practices that all researchers follow to help each other
navigate the complexity of our world through data. This includes standards that are well-adopted,
well-implemented and managed, and endorsed through bodies such as the long-standing Scientific Unions
and other professional organizations. Increasingly, these bodies have a new role in recommending the
authoritative source of this information and in collaborating with other Unions on vocabularies and best
practices that can be used by multiple domains [3].


Two organizations working hard to leverage each other's good work are the International Union of Pure 
and Applied Chemistry (IUPAC) and the American Geophysical Union (AGU). Both are celebrating their 
centennial in 2019 and have been actively engaging over the last year to explore their common challenges 
and support each other in taking steps toward more open and FAIR data. These celebrations are timely 
opportunities to establish goals that acknowledge the future by providing even better support to researchers 
and the many types of research products that constitute the scientific record, not least of which is data. 


IUPAC was established in 1919 as a neutral and objective international scientific organization to formulate
a common language for chemistry and provide expert guidance on community processes and procedures
for standards development. It is a recognized member of the International Science Council (ISC). The
IUPAC vision is to be “an indispensable resource for chemistry”, through its mission, “to provide objective
scientific expertise and develop the essential tools for the application and communication of chemical
knowledge for the benefit of humankind and the world”①. Standard recommendations and technical
guidance are published in IUPAC’s flagship journal, Pure and Applied Chemistry, along with a number of
specific terminologies in various sub-disciplines through the Color Book program, as well as a number of
evaluated data collections②.


AGU is both a professional society with 60,000 members in the Earth, space, and environmental sciences, 
and a society publisher with 22 peer-reviewed journals. The AGU vision is to “galvanize a community of 
Earth and space scientists that collaboratively advances and communicates science and its power to ensure 
a sustainable future”. In AGU’s 2019 centennial celebration and programming, the organization is focused
on supporting the advancement of Earth and space science while providing a platform to broaden and 
deepen engagement within and outside the Earth and space science community. 


2. OPEN AND FAIR DATA 


AGU has recently convened a community effort to make data open and FAIR through the Enabling FAIR
Data project [4], funded by Arnold Ventures (formerly the Laura and John Arnold Foundation). In a separate
effort, IUPAC has been working toward similar goals since 2014.


Researchers from the two communities share expertise in thermodynamics, petroleum, geochemistry, 
solubility, toxicology and element signatures to name just a few. They are both challenged with how best 
to establish common vocabularies, metadata best practices, and formats with formal structures that 
sustainably support these areas of expertise. They are further challenged with limitations on the number of 
repositories and the amount of curation support available to preserve the large and complex body of 
supporting data as part of the scientific record. Researchers in both communities face the burden of 
nonexistent or unaligned guidelines from funders, publishers, and others on what is required for data 
management and preservation in these disciplines. 


3. ESTABLISHING A FAIR COLLABORATION 


AGU and the Enabling FAIR Data community have put into place a Commitment Statement [5] that
sets out, for each stakeholder community (such as journals, repositories and researchers), its role in
enabling open and FAIR data. The primary goal is to move data that support research out of the
supplementary information and into a trusted repository where they can be discovered and documented
separately from the article. In this effort the focus was mostly on the “F” and “A” of FAIR; additional
work is needed to firm up the approach to “I” and “R”, working with GO FAIR and other science
communities with intersections in the geosciences.


① https://iupac.org/who-we-are.
② https://iupac.org/what-we-do.


In 2014, IUPAC identified the need to help facilitate a consistent global framework for human- and
machine-readable (and “understandable”) chemical information in collaboration with other science
communities, industries and governments. This vision was articulated by one member as “Digital IUPAC”
[6], and was incorporated into the Terms of Reference of the IUPAC Committee on Publications and
Cheminformatics Data Standards, to “advise [IUPAC] on all aspects of the design and implementation of
publications and data-sharing, ... and to promote the compatibility of the electronic transmission, storage,
and management of digital content through the development of standards...” [7]. A subcommittee was
established in 2016 to explore the cheminformatics data standards needs of the chemistry community,
coordinate expertise within IUPAC, and prioritize international activity through collaborative efforts with
the Research Data Alliance (RDA) and the Committee on Data (CODATA-ISC), among others. In 2018,
IUPAC worked with community members to establish a nascent Chemistry Implementation Network
(ChIN)③ within the GO FAIR Initiative, and officially endorsed the manifesto in early 2019 [8].


In the collaboration efforts between AGU, IUPAC, and their respective broader communities, we can 
begin to agree how our vocabularies and standards are related. For example, in the field of geochemistry, 
vocabularies and standards that describe a rock or mineral that is chemically analyzed can come from the 
International Union of Geological Sciences (IUGS), whilst the chemical properties can come from data 
initiatives associated with the Periodic Table stewarded by IUPAC [9]. 


4. IMPORTANCE OF SCIENTIFIC COMMUNITIES, UNIONS AND SOCIETIES 


The scientific ecosystem has many stakeholders that must work together to make incremental changes
toward significant goals. The tenets defined in the FAIR Guiding Principles have served as a point of
convergence for most, if not all, of these stakeholders throughout the history of scientific research.
Communities, Unions and Societies recognize the importance of their roles in promoting improvements in
scientific communication and in driving the changes needed.


AGU has a long-standing partnership with Earth Science Information Partners (ESIP)④ and, more
recently, with the RDA⑤ to further improve how scientific data are managed. Through ESIP and RDA,
research communities bring their ideas to collaborate across the sciences in an international setting.


The geoscience standards bodies include the Open Geospatial Consortium (OGC), the IUGS, and more
broadly, the International Organization for Standardization (ISO). Standards bodies endorse vocabularies,
definitions and other supporting information necessary for well-governed geostandards. Through their
leadership, the way we document our interactions with our research has a common language for better,
more accurate understanding.


③ https://www.go-fair.org/implementation-networks/overview/chemistryin/.
④ https://www.esipfed.org.
⑤ https://www.rd-alliance.org.


IUPAC is regarded by the chemistry community as the world authority on chemical nomenclature and 
terminology, standardized methods for measurement, atomic weights and many other critically-evaluated 
data. This authority is established through the participation of National Adhering Organizations (NAOs) and 
companies in those bodies of the union responsible for formulating, ratifying and curating the standard 
recommendations, including many chemical societies and national standards agencies. 


The scientific unions belong to the ISC and many participate in the ISC’s CODATA. CODATA is an
advocate for the FAIR Guiding Principles through recent efforts by Simon Hodson, CODATA Executive
Director, who chaired the European Commission’s Expert Group on FAIR Data⑥ and published Turning
FAIR into Reality: Final Report and Action Plan from the European Commission Expert Group on FAIR Data
[10].


5. HOW “I” BRINGS US TOGETHER 


This collaboration between chemistry and the geosciences has led to a better understanding of our
strengths in data management and an opportunity to share our experiences and approaches. We are engaged
at two levels. The first is the general implementation of FAIR: community organization, awareness and
encouraging adoption. The second is a much deeper level of coordination and collaboration on the
interoperability of data that are generated and used by both the chemistry and geoscience communities.


Implementation of FAIR: During a recent presentation at the National Meeting of the American 
Chemical Society (ACS) in April 2019, Leah McEwen and Shelley Stall described the approach taken by 
the Enabling FAIR Data project as compared to that of IUPAC in considering implementation of FAIR. 


The Enabling FAIR Data project started from the premise that to achieve findable data, we need common 
guidelines for all scholarly journals, scientific repositories and funders. We also need our researchers to 
deposit their data in trusted repositories that support the FAIR principles. With certifications like 
CoreTrustSeal⑦ and the use of persistent identifiers, this is an area where our repository communities can
adopt existing best practices. Further, to encourage and ensure this behavior takes place, we need our 
journals and funders to implement common guidelines for our research data. The Enabling FAIR Data 
project is working with journals on adoption of the project's common author guidelines. Some funders such 
as Wellcome Trust [11] and the Belmont Forum [12] have put into place clear guidance on their requirements 
for open data and other digital objects. 


⑥ http://www.codata.org/working-groups/fair-data-expert-group.
⑦ https://www.coretrustseal.org.


As a long-standing standards body, IUPAC has supported a common language and standard practices
for communicating chemical information for a century. As we assess these outputs and how they can be
translated toward FAIR practices, we appreciate that IUPAC efforts toward authoritative community-wide
standards have always been grounded in the needs of the greater scientific community to be able to reuse,
compile and analyze interoperable chemical data of measurable quality. Looking forward toward a more
“Digital IUPAC”, we aspire to improve the accessibility and findability of chemical data across the globe,
across sectors and across disciplines. Some mechanisms developed to support machine-accessible
interoperability across the community, such as the International Chemical Identifier (InChI)⑧, can be
further applied in metadata schemas to improve discovery of chemical data by other systems more broadly.
However, wider dissemination of chemical data is held back most by the lack of sustainable and scalable
technical expertise and infrastructure to manage these processes.
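
To illustrate how an identifier such as InChI could support discovery across systems, the following Python sketch indexes dataset records by the InChI string of the substance they describe. It is a minimal sketch under stated assumptions: the record fields, repository names and DOIs are invented for illustration and do not describe any existing schema or service. Only the two InChI strings (water and methane) are real.

```python
from collections import defaultdict

# Hypothetical metadata records; fields, repository names and DOIs are made up.
RECORDS = [
    {"doi": "10.0000/example.1", "repository": "geo-repo", "inchi": "InChI=1S/H2O/h1H2"},
    {"doi": "10.0000/example.2", "repository": "chem-repo", "inchi": "InChI=1S/H2O/h1H2"},
    {"doi": "10.0000/example.3", "repository": "chem-repo", "inchi": "InChI=1S/CH4/h1H4"},
]


def build_index(records):
    """Group dataset DOIs by the InChI of the substance they describe."""
    index = defaultdict(list)
    for record in records:
        index[record["inchi"]].append(record["doi"])
    return dict(index)


INDEX = build_index(RECORDS)
WATER = "InChI=1S/H2O/h1H2"

# Datasets about water can be found regardless of which repository holds them.
print(INDEX[WATER])
```

Because the identifier is derived from the chemical structure rather than from a local name, records held in different repositories can be grouped without agreeing on a naming convention first.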


During our ongoing collaboration, we continue to discuss how we can help each other achieve mutual 
goals using the FAIR Guidelines as the framework. Approaching FAIR with somewhat different emphasis 
and priority suggests several areas where our progress on these principles can complement and build on 
each other, furthering our collective efforts towards multi-domain interoperability. A brief summary: 


Findable: More guidance is needed on implementing FAIR metadata and other criteria at all levels of
describing data objects, from DOIs to trusted repositories. The efforts within the Enabling FAIR Data project
to recommend the CoreTrustSeal certification for repositories in the geosciences could inform similar reviews
of repositories in chemistry.


Accessible: Data are made accessible through the services provided by the selected trusted repository.
The value of a trusted repository to the FAIR principles is paramount, as demonstrated in the Enabling FAIR
Data project. While there are some specialized data repositories in chemistry, very few options that provide
appropriate services exist for chemical data more generally. Outside the US, there are few specialized data
repositories for geoscience data, which makes adequate curation of data difficult.


Interoperable: Chemistry, and specifically crystallography, could be considered important examples of
interoperable data. These communities have developed exemplar data information formats specific to their
data types [13]. Within the geosciences, the IUGS has led global interoperability of geoscience data since
2004. Similarly, in geophysics there have been global standards [14] for more than 30 years. This provides
solid groundwork for building interoperability within and across chemistry and the geosciences.


Reusable: The ability of researchers and other stakeholders to reuse data depends on many factors,
including adequately documented provenance, domain-specific metadata, and licensing information.
Several stakeholders may be involved in contributing to this guidance, including standards bodies, publishers
and trusted repositories.
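
The four points above can be sketched as a simple completeness check on a dataset's metadata record. This is only an illustration: the field names and their grouping under each FAIR letter are assumptions made for this example, not an official FAIR assessment or a schema recommended by either project.

```python
# Hypothetical grouping of metadata fields under each FAIR letter.
FAIR_FIELDS = {
    "Findable": ["identifier", "title"],
    "Accessible": ["repository", "access_url"],
    "Interoperable": ["format", "vocabulary"],
    "Reusable": ["license", "provenance"],
}


def missing_fields(record):
    """Return, per FAIR letter, the fields that are absent or empty."""
    gaps = {}
    for principle, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[principle] = absent
    return gaps


# Made-up example record with no vocabulary and no provenance.
example = {
    "identifier": "doi:10.0000/example.1",
    "title": "Example rock analysis",
    "repository": "example-repository",
    "access_url": "https://repository.example/datasets/1",
    "format": "text/csv",
    "vocabulary": "",
    "license": "CC-BY-4.0",
}

GAPS = missing_fields(example)
print(GAPS)
```

A check of this kind only shows whether fields are present. Whether their content is adequate still depends on the guidance from standards bodies, publishers and repositories discussed above.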


⑧ https://www.inchi-trust.org.


Coordination and Collaboration on Interoperability: Within FAIR, “I” or Interoperability is complex.
Agreement on what it means needs to happen in local communities, across a domain, internationally by
domain, and then cross-domain. One example of a domain working hard on being FAIR and interoperable
is that of the ocean sciences [15]. Researchers in this domain have a driving need to use data collected
on the many funded scientific vessel cruises in larger research efforts. They are implementing
semantic standards and have a growing worldwide network. Their next step is to move toward cross-domain
interoperability, which will require a well-defined and described vocabulary that is inclusive of what is
currently adopted by research communities and mapped to other relevant vocabularies. Practices need to
be put into place that encourage the use of common vocabularies to maintain their value and the authority
of their use. Cross-domain interoperability also needs common formats for data so that they can easily be
pulled into tools used by all communities.
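
The mapping step described above can be sketched in a few lines of Python. All names here are hypothetical: the local column names, the `shared:` term identifiers and the mapping table are invented for illustration and do not come from any published vocabulary.

```python
# Hypothetical mapping from locally used terms to shared vocabulary identifiers.
LOCAL_TO_SHARED = {
    "sea temp": "shared:sea_water_temperature",
    "SST": "shared:sea_water_temperature",
    "salinity": "shared:sea_water_salinity",
}


def harmonize(columns, mapping):
    """Map local column names to shared terms and report the ones left unmapped."""
    mapped = {}
    unmapped = []
    for name in columns:
        key = name.strip()
        if key in mapping:
            mapped[name] = mapping[key]
        else:
            unmapped.append(name)
    return mapped, unmapped


MAPPED, UNMAPPED = harmonize(["SST", "salinity", "depth_m"], LOCAL_TO_SHARED)
print(MAPPED)
print(UNMAPPED)
```

Reporting the unmapped names matters as much as the mapping itself, because those are the terms a community would need to add to, or align with, the shared vocabulary.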


The chemical information community has been striving toward this reality through various efforts to
“translate” and “harmonize” chemical nomenclature, other terminology and chemical data reporting
standards into digital formats. Building on IUPAC’s authoritative scientific definitions, the goal is to develop
machine-readable technical descriptions that facilitate accurate reporting, the exchange of chemical data
from system to system, and further scientific analysis and informatics processing. See the Gold Book
Compendium of Chemical Terminology development project for an early-phase example [16]. As with
communicating data between human experts, common units of meaning and modes of expression are
necessary to accurately define the context of chemical data for use in expert computer systems. Understanding
how other disciplines refer to and describe chemicals in their research, to help build these bridges across
domains and use cases, is an emerging goal for IUPAC in the coming year.


Broader adoption of community-used vocabularies is needed, but even more so, we need the entire
research process to be well intertwined with good data management (Figure 1). This process includes
data management plans, field notes, instruments, lab notebooks, data and sample preparation, analysis,
modeling, data transformations, visualizations, archiving and preservation. Each of these elements
needs to be aligned, and there are very few examples of domains that are doing this well.


Figure 1. Research processes that need to be intertwined with good data management practices.


Different entities are responsible for nearly every step of this simplified process, including commercial
software, instrument manufacturers and industry. Further, the process in reality is not linear but
iterative.


6. PROGRESS IN IMPLEMENTING AND SUPPORTING FAIR 


Outreach and Awareness: Communication of new recommendations and guidelines is important for
awareness and adoption. For the Enabling FAIR Data project, papers on the outcomes of each stakeholder
meeting [4, 17, 18] were published, along with a guide for reviewers and editors on the author
guidelines for data [19].


As part of its outreach efforts to monitor cheminformatics needs and raise community awareness, IUPAC
is engaging in a series of symposia and workshops worldwide in collaboration with other chemistry and
data organizations, including the ACS, the Royal Society of Chemistry (RSC), the International Union of
Crystallography (IUCr), RDA, CODATA and several additional chemical societies and organizations in
Europe and on the Pacific Rim. Outcomes have appeared in a number of reports and articles
[20] and have led to the identification of a number of key areas of activity to support human- and
machine-readable exchange of chemical data [21].


Training: Educational and training resources are becoming available on the FAIR Data Principles
generally, on how the FAIR Data Principles can be applied to data from different domains, and
on various tools that can be used to help make your data FAIR. The ESIP-hosted Data Management Training
Clearinghouse (DMTC) [22] is a continually growing, curated registry of information about existing
educational and training resources, ranging from full online courses for credit to short online tutorials, video
presentations, and learning activities. The resources in the DMTC can be accessed
by means of both search and browse functionality, including filters or facets that can help you
find appropriate resources for your needs more precisely. As an example, a key access point into the training
resources available on the FAIR Data Principles is the “framework” filter, which lists the resources
currently available that discuss, in whole or in part, one or more of the FAIR Data Principles (Figure 2).
Creators of educational and training resources are encouraged to submit their resources to the DMTC for
publication using the “Submit” button on the menu bar and/or the landing page.


[Screenshot: Data Management Training Clearinghouse search page, with menu items Home, Browse, Search,
Submit, Help and About, and a list of matching training resources.]


Figure 2. The Data Management Training Clearinghouse website demonstrating the filter alignment with the FAIR 
Guiding Principles. 


Adoption: Awareness needs to be coupled with adoption for change to occur across our broad
ecosystem. For the Enabling FAIR Data project, we track the number of signatories and continue to engage
with those that are working toward becoming a signatory, making sure that all their questions and concerns
are addressed. The Enabling FAIR Data project has over 150 signatories [23], with strong participation from
the Earth, space and environmental science journal and repository communities.


The most recent chemistry data community workshop focused on devising “FAIR Publishing Guidelines
for Spectral Data and Chemical Structures”. It was funded by the National Science Foundation (NSF) and
held in conjunction with the Spring 2019 ACS meeting in Orlando, Florida [24]. A preliminary survey of
current chemical data publishing requirements revealed a mix of digital and analog practices with generally
little guidance on preparing data [25]. The workshop brought together domain publishers, databases,
repositories, software developers, researchers, librarians, standards organizations and data initiatives to
draft practical workflows and community-wide value propositions for publishing these common chemistry
data types in a more FAIR-enabled manner. Planning for a pilot is underway among several chemistry
publishers.


7. OPPORTUNITIES FOR COLLABORATION 


Chemistry is one of the fundamental physical sciences and has branches in many related scientific 
disciplines. By creating a foundational set of interoperable vocabularies and standards in the Periodic Table 
and other fundamental chemistry standards, groups like the geosciences can leverage such standards and 
vocabularies in their own fields [9]. Chemistry standards can be used in combination with relevant standards 
that describe the geological samples that were analyzed (e.g., controlled vocabularies from the Commission 
for the Management and Application of Geoscience Information (CGI) of the IUGS [26] on lithology, 
composition, alteration, etc., or the International Mineralogical Association (IMA) list of standardized names 
of mineral species [27]). 


2019 is the International Year of the Periodic Table, and an area of joint need for the geoscience and
chemistry communities is access to the evaluated elemental data that underlie the table [28]. The IUPAC
Commission on Isotopic Abundances and Atomic Weights (CIAAW) is dedicated to making these data as
open and accessible as possible [29, 30]. A preliminary implementation pilot is underway to disseminate
these data, along with other authoritative agency sources, in machine-accessible form through the PubChem
data framework at the US National Institutes of Health [31]. Additional initiatives are also underway in
IUPAC to standardize more types of digital data formats and descriptive metadata for chemical
characterization, such as spectroscopic data.
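
As a small illustration of why machine-accessible elemental data matter to both communities, the sketch below computes a molar mass from a lookup table of atomic weights. It rests on several assumptions: the table holds a few abridged values hard-coded for this example (they are not retrieved from CIAAW or PubChem, and authoritative values should be taken from those sources), and the hypothetical `molar_mass` helper handles only simple formulas without brackets.

```python
import re

# Illustrative abridged atomic weights, hard-coded for this example only.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "O": 15.999, "Si": 28.085}


def molar_mass(formula):
    """Sum atomic weights for a simple formula such as 'SiO2' (no brackets)."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


# Quartz (SiO2) is a common mineral in geochemical analyses.
print(round(molar_mass("SiO2"), 3))
```

If the lookup table were replaced by an authoritative, versioned source in machine-accessible form, every tool using it would work from the same evaluated values.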


Sharing common chemical standards across these multiple disciplines will ultimately facilitate
interdisciplinary and transdisciplinary science. As the recognized authority in chemical nomenclature and
representation, IUPAC is keen to understand how and where other disciplines are
representing chemicals in their data workflows and metadata schemas. To map and analyze the
chemical landscape more broadly, a challenge area for IUPAC, ChIN, the RDA Chemistry Interest Group
and other collaborators will be to survey the collective data space in many fields for references to chemical
information. The geoscience data community holds a wealth of data with diverse interests in chemical data
standards and vocabularies. Shall we start a joint initiative on “What is a Chemical”?


8. CONCLUSION


The aim of this article has been to articulate how we see current data sharing practices within these
communities, and to compare and contrast them with a view toward highlighting similarities, differences
and shared challenges. We hope this provides a launching pad for newer initiatives to identify how they
can enrich and complement existing community activities and ensure their endorsement by the relevant
International Science Union and/or appropriate professional body.